Reducing false positives in molecular pattern recognition.
Abstract
In the search for new cancer subtypes by gene expression profiling, it is essential to avoid misclassifying samples of unknown subtypes as known ones. In this paper, we evaluated the false positive error rates of several classification algorithms through a 'null test': presenting the classifiers with a large collection of independent samples that do not belong to any of the tumor types in the training dataset. The benchmark dataset is available at www2.genome.rcast.u-tokyo.ac.jp/pm/. We found that k-nearest neighbor (KNN) and support vector machine (SVM) classifiers have very high false positive error rates when few genes (fewer than 100) are used for prediction. The error rate can be partially reduced by including more genes. In contrast, the prototype matching (PM) method has a much lower false positive error rate, and this robustness can be achieved without loss of sensitivity by introducing suitable measures of prediction confidence. We also proposed a cluster-and-select technique for selecting genes for classification. The nonparametric Kruskal-Wallis H test is used to select genes that are differentially expressed across multiple tumor types. To reduce redundancy, we then divided these genes into clusters with similar expression patterns and selected a given number of genes from each cluster. The reliability of the new algorithm is tested on three public datasets.
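The pipeline summarized above can be illustrated with a short, dependency-free Python sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the prototype similarity, or the confidence measure, so the greedy correlation-based ("leader") clustering, the Pearson-correlation prototype matcher, the rejection threshold `min_corr`, and all function names below are hypothetical choices. The H statistic is computed without the usual tie correction. Samples are plain lists of expression values indexed by gene, and a prediction of `None` stands for "not one of the known tumor types".

```python
import math
from statistics import mean

# Hypothetical sketch: the clustering method, similarity measure and min_corr
# threshold are illustrative assumptions, not taken from the paper.


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, without the tie correction factor."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def cluster_and_select(X, labels, n_top, corr_threshold=0.8, per_cluster=1):
    """Hypothetical cluster-and-select over samples X (lists of expression values).

    1. Rank genes by Kruskal-Wallis H across classes and keep the n_top best.
    2. Greedy leader clustering: a gene joins the first cluster whose leader it
       correlates with at >= corr_threshold, otherwise it starts a new cluster.
    3. Keep the per_cluster highest-ranked genes of every cluster.
    """
    classes = sorted(set(labels))

    def h_score(gene):
        groups = [[x[gene] for x, y in zip(X, labels) if y == c] for c in classes]
        return kruskal_wallis_h(groups)

    # sorted() is stable, so genes with equal H keep their original order.
    ranked = sorted(range(len(X[0])), key=h_score, reverse=True)[:n_top]
    clusters = []
    for gene in ranked:
        profile = [x[gene] for x in X]
        for cluster in clusters:
            leader = [x[cluster[0]] for x in X]
            if pearson(profile, leader) >= corr_threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return [g for cluster in clusters for g in cluster[:per_cluster]]


def fit_prototypes(X, labels, genes):
    """Prototype of each class = mean expression over the selected genes."""
    return {
        c: [mean(x[g] for x, y in zip(X, labels) if y == c) for g in genes]
        for c in sorted(set(labels))
    }


def predict(prototypes, genes, sample, min_corr=0.5):
    """Best-correlated class, or None ("unknown") if the match is below min_corr."""
    v = [sample[g] for g in genes]
    scores = {c: pearson(v, p) for c, p in prototypes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_corr else None


def null_test_fpr(prototypes, genes, null_samples, min_corr=0.5):
    """False positive rate: share of out-of-class samples assigned to a known class."""
    hits = sum(predict(prototypes, genes, s, min_corr) is not None for s in null_samples)
    return hits / len(null_samples)
```

In a null test, `null_test_fpr` is applied to samples from tumor types absent from training. Setting `min_corr=float("-inf")` disables rejection, so every null sample is forced into some known class and counted as a false positive; a positive threshold lets the matcher answer "unknown" instead. Raising `min_corr` trades sensitivity on true class members for fewer false positives, which is the balance the abstract addresses with confidence measures.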
Journal:
- Genome informatics. International Conference on Genome Informatics
Volume 14
Publication date: 2003